XML Schema (W3C)

XML Schema (W3C)
Filename extension .xsd
Internet media type application/xml, text/xml
Developed by World Wide Web Consortium
Type of format XML Schema language
Extended from XML
Standard(s)

1.0, Part 1 Structures (Recommendation),
1.0, Part 2 Datatypes (Recommendation),
1.1, Part 1 Structures (Candidate Recommendation),

1.1, Part 2 Datatypes (Candidate Recommendation)

XML Schema, published as a W3C recommendation in May 2001, is one of several XML schema languages. It was the first separate schema language for XML to achieve Recommendation status by the W3C. Because of confusion between XML Schema as a specific W3C specification, and the use of the same term to describe schema languages in general, some parts of the user community referred to this language as WXS, an initialism for W3C XML Schema, while others referred to it as XSD, an initialism for XML Schema Document—a document written in the XML Schema language, typically containing the "xsd" XML namespace prefix and stored with the ".xsd" filename extension. In Version 1.1 (currently in July 2011 a Candidate Recommendation), the W3C has chosen to adopt XSD as the preferred name, and that is the name used in this article.

Like all XML schema languages, XSD can be used to express a set of rules to which an XML document must conform in order to be considered 'valid' according to that schema. However, unlike most other schema languages, XSD was also designed with the intent that determination of a document's validity would produce a collection of information adhering to specific data types. Such a post-validation infoset can be useful in the development of XML document processing software, but the schema language's dependence on specific data types has provoked criticism.

Contents

History

In its appendix of references, the XSD specification acknowledges the influence of DTDs and other early XML schema efforts such as DDML, SOX, XML-Data, and XDR. It has adopted features from each of these proposals but is also a compromise among them. Of those languages, XDR and SOX continued to be used and supported for a while after XML Schema was published. A number of Microsoft products supported XDR until the release of MSXML 6.0 (which dropped XDR in favor of XML Schema) in December 2006.[1] Commerce One, Inc. supported its SOX schema language until declaring bankruptcy in late 2004.

The most obvious features offered in XSD that are not available in XML's native Document Type Definitions (DTDs) are namespace awareness, and datatypes, that is, the ability to define element and attribute content as containing values such as integers and dates rather than arbitrary text.

The XSD 1.0 specification was originally published in 2001, with a second edition following in 2004 to correct large numbers of errors. A sequence of drafts of its successor, XSD 1.1, have been published, the latest being a Candidate Recommendation dated 21 July 2011.

Schemas and Schema Documents

Technically, a schema is an abstract collection of metadata, consisting of a set of schema components: chiefly element and attribute declarations and complex and simple type definitions. These components are usually created by processing a collection of schema documents, which contain the source language definitions of these components. In popular usage, however, a schema document is often referred to as a schema.

Schema documents are organized by namespace: all the named schema components belong to a target namespace, and the target namespace is a property of the schema document as a whole. A schema document may include other schema documents for the same namespace, and may import schema documents for a different namespace.

When an instance document is validated against a schema (a process known as assessment), the schema to be used for validation can either be supplied as a parameter to the validation engine, or it can be referenced directly from the instance document using two special attributes, xsi:schemaLocation and xsi:noNamespaceSchemaLocation. (The latter mechanism requires the client invoking validation to trust the document sufficiently to know that it is being validated against the correct schema. "xsi" is the conventional prefix for the namespace "http://www.w3.org/2001/XMLSchema-instance".)

XML Schema Documents usually have the filename extension ".xsd". A unique Internet Media Type is not yet registered for XSDs, so "application/xml" or "text/xml" should be used, as per RFC 3023.

Data types

Unlike DTDs, an XML Schema allows the content of an element or attribute to be validated against a data type. For example, an attribute might be constrained to hold only a valid date or a decimal number.

XSD provides a set of 19 primitive data types (anyURI, base64Binary, boolean, date, dateTime, decimal, double, duration, float, hexBinary, gDay, gMonth, gMonthDay, gYear, gYearMonth, NOTATION, QName, string, and time). It allows new data types to be constructed from these primitives by three mechanisms:

Twenty-five derived types are defined within the specification itself, and further derived types can be defined by users in their own schemas.

Post-Schema-Validation Infoset

After XML Schema-based validation, it is possible to express an XML document's structure and content in terms of the data model that was implicit during validation. The XML Schema data model includes:

This collection of information is called the Post-Schema-Validation Infoset (PSVI). The PSVI gives a valid XML document its "type" and facilitates treating the document as an object, using object-oriented programming (OOP) paradigms.

Example

This is an example of a rather simple schema document to describe an address.

<?xml version="1.0" encoding="utf-8"?>
<xs:schema elementFormDefault="qualified" xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="Address">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="Recipient" type="xs:string" />
        <xs:element name="House" type="xs:string" />
        <xs:element name="Street" type="xs:string" />
        <xs:element name="Town" type="xs:string" />
        <xs:element name="County" type="xs:string" minOccurs="0" />
        <xs:element name="PostCode" type="xs:string" />
        <xs:element name="Country">
          <xs:simpleType>
            <xs:restriction base="xs:string">
              <xs:enumeration value="IN" />
              <xs:enumeration value="DE" />
              <xs:enumeration value="ES" />
              <xs:enumeration value="UK" />
              <xs:enumeration value="US" />
            </xs:restriction>
          </xs:simpleType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>

A number of development tools can be used to create a graphical representation of a schema. Many of them create diagrams similar to the one shown below:

An example of an XML document that conforms to this schema

<?xml version="1.0" encoding="utf-8"?>
<Address xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
         xsi:noNamespaceSchemaLocation="SimpleAddress.xsd">
  <Recipient>Mr. Walter C. Brown</Recipient>
  <House>49</House>
  <Street>Featherstone Street</Street>
  <Town>LONDON</Town>
  <PostCode>EC1Y 8SY</PostCode>
  <Country>UK</Country>
</Address>

Secondary uses for XML Schemas

The primary reason for defining an XML schema is to formally describe an XML document; however the resulting schema has a number of other uses that go beyond simple validation.

Code generation

The schema can be used to generate code, referred to as XML Data Binding. This code allows contents of XML documents to be treated as objects within the programming environment.

Generation of XML file structure documentation

The schema can be used to generate human-readable documentation of an XML file structure; this is especially useful where the authors have made use of the annotation elements. No formal standard exists for documentation generation, but a number of tools are available, such as the Xs3p stylesheet, that will produce high quality readable HTML and printed material.

Criticism

Although XML Schema is successful in that it has been widely adopted and largely achieves what it set out to achieve, it has been the subject of a great deal of severe criticism, perhaps more so than any other W3C Recommendation.

A good summary of the criticisms is provided by James Clark[2] (who promotes his own alternative, RELAX NG):

Version 1.1

As of July 2011, XSD 1.1 is a Candidate Recommendation, which means it is in the final review phase before becoming an approved W3C specification. This requires completion of conformance testing reports from two implementations. (Xerces and Saxon have both released implementations that are around 90% complete.)

Significant new features in XSD 1.1 are:

Until the latest draft, XSD 1.1 also proposed the addition of a new numeric data type, precisionDecimal. This proved controversial, and was therefore dropped from the specification at a late stage of development.

See also

Further reading

References

External links

W3C XML Schema 1.0 Specification

W3C XML Schema 1.1 Specification

Other